A Buffering Strategy to Avoid Ordering Effects in Clustering
Abstract
It is widely reported in the literature that incremental clustering systems suffer from instance ordering effects and that, under some orderings, extremely poor clusterings may be obtained. In this paper we present a new strategy aimed at mitigating these effects, the Not-Yet strategy, which has a general and open formulation and is not coupled to any particular system. Results suggest that the strategy improves clustering quality, but that its performance is constrained by its limited foresight. We also show that, when combined with other strategies, the Not-Yet strategy may help the system obtain high-quality clusterings.
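The abstract does not spell out how instances are buffered, so the following is only a minimal sketch of the general "not yet" idea under our own assumptions. A simple leader-style incremental clusterer stands in for the paper's system. An instance is deferred when its two closest cluster centroids are nearly tied (gap below a hypothetical `margin`), deferred instances wait in a bounded buffer and are retried after every placement, and `flush()` forces any leftovers into place at the end of the stream. The class `NotYetClusterer` and the parameters `radius`, `margin` and `buffer_size` are illustrative names, not taken from the paper.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


class NotYetClusterer:
    """Hypothetical sketch: leader-style incremental clustering plus a
    bounded buffer for instances whose placement is ambiguous."""

    def __init__(self, radius: float, margin: float, buffer_size: int) -> None:
        self.radius = radius            # max centroid distance for joining a cluster
        self.margin = margin            # assumed: min best/runner-up gap to decide now
        self.buffer_size = buffer_size  # assumed: capacity of the deferral buffer
        self.clusters: List[List[Point]] = []
        self.buffer: List[Point] = []

    @staticmethod
    def _centroid(members: List[Point]) -> Point:
        n = len(members)
        return tuple(sum(coord) / n for coord in zip(*members))

    def _ranked(self, x: Point) -> List[Tuple[float, int]]:
        # (distance to centroid, cluster index), closest first
        return sorted(
            (math.dist(x, self._centroid(c)), i) for i, c in enumerate(self.clusters)
        )

    def _is_ambiguous(self, x: Point) -> bool:
        # Assumed criterion: the two closest clusters are nearly tied.
        ranked = self._ranked(x)
        if len(ranked) < 2:
            return False
        return ranked[1][0] - ranked[0][0] < self.margin

    def _place(self, x: Point) -> None:
        # Join the closest cluster if it is within radius, else start a new one.
        ranked = self._ranked(x)
        if ranked and ranked[0][0] <= self.radius:
            self.clusters[ranked[0][1]].append(x)
        else:
            self.clusters.append([x])

    def _retry_buffer(self) -> None:
        # Re-examine deferred instances against the updated clusters.
        pending, self.buffer = self.buffer, []
        for y in pending:
            if self._is_ambiguous(y):
                self.buffer.append(y)
            else:
                self._place(y)

    def add(self, x: Point) -> None:
        if self._is_ambiguous(x) and len(self.buffer) < self.buffer_size:
            self.buffer.append(x)  # "not yet": postpone the decision
            return
        # Unambiguous, or buffer full (assumed policy): decide immediately.
        self._place(x)
        self._retry_buffer()

    def flush(self) -> List[List[Point]]:
        # End of stream: force a placement for anything still deferred.
        for y in self.buffer:
            self._place(y)
        self.buffer = []
        return self.clusters
```

With `radius=2.0` and `margin=0.5`, the stream `(0,0), (10,0), (5,0), (0,1)` leaves `(5,0)` in the buffer, because it is (almost) equidistant from the first two clusters; `flush()` then places it in a cluster of its own. The buffer gives later instances a chance to shift the centroids before an ambiguous decision is made, which is one way of reducing sensitivity to presentation order, though the fixed buffer also illustrates the "limited foresight" the abstract mentions.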
Publication date: 1998